Query: Adds ability to choose global vs local/focused statistics for FullTextScore #5582
Merged
microsoft-github-policy-service[bot] merged 8 commits on Feb 6, 2026
Conversation
adityasa
reviewed
Jan 30, 2026
Contributor
Is it possible to add some emulator-based e2e tests?
In both cases, the query should honor the FullTextScore scope (local vs. global).
adityasa
reviewed
Feb 3, 2026
sc978345
previously approved these changes
Feb 4, 2026
sboshra
reviewed
Feb 4, 2026
…stics for full text search
…s for FulTextScoreScore.Local
07a86a1 to
44ba1c6
This was referenced Apr 28, 2026
Merged
Bump Microsoft.Azure.Cosmos from 3.58.0 to 3.59.0
azureossd/general-database-connectivity-samples#40
Merged
Enabling users to choose global vs local/focused statistics for FullTextScore
Why?
Cosmos DB’s implementation of FullTextScore computes BM25 statistics (term frequency, inverse document frequency, and document length) across all documents in the container, including all physical and logical partitions.
While this provides a valid and comprehensive representation of statistics for the entire dataset, it introduces challenges for several common use cases.
In multi-tenant scenarios, it is often necessary to isolate queries to data belonging to a specific tenant, typically defined by the partition key or a component of a hierarchical partition key. This enables scoring to reflect statistics that are accurate for that tenant’s dataset, rather than for the entire container. For customers such as Veeam and Sitecore, which operate large multi-tenant containers, this is not just an optimization but a requirement. Their tenants often operate in very different domains, which can significantly change the distribution and importance of keywords and phrases. Using global statistics in these cases leads to distorted relevance rankings.
In other scenarios involving hundreds or thousands of physical partitions, computing statistics across the entire container can become both time-consuming and expensive. Customers may prefer to use statistics derived from only a subset of partitions to improve performance and reduce RU consumption. There is precedent for this: Azure AI Search defaults to this “local” method.
What?
We propose extending the flexibility of BM25 scoring in Cosmos DB so that developers can choose between a global FullTextScore (existing behavior) and a scoped FullTextScore (statistics restricted to the partition key(s) used in the query). The key aspects:
For global BM25, FullTextScore retains its existing behavior and computes BM25 statistics, such as IDF and average document length, across all documents in the container regardless of any partition key filters in the query.
In scoped BM25, when a query includes a partition key filter or explicitly requests scoped scoring, the engine computes these statistics only over the subset of documents within the specified partition key values. Query results are still returned only from the filtered partitions, and the resulting scores and ranking reflect relevance within that partition-specific slice of data.
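The difference between the two modes can be illustrated with a small, self-contained Python sketch. It is not the Cosmos DB implementation: the tokenizer, the IDF variant (the common `ln(1 + (N - n + 0.5) / (n + 0.5))` form), the `k1` and `b` defaults, and the sample `tenant`/`text` documents are all assumptions chosen for illustration. It only shows how the same term receives a different weight depending on whether statistics come from the whole container or from one partition key value.

```python
import math
from collections import Counter


def bm25_stats(docs):
    """Collect the corpus-level BM25 inputs for a set of documents."""
    tokenized = [d["text"].lower().split() for d in docs]
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # document frequency: count each term once per doc
    avg_len = sum(len(t) for t in tokenized) / len(tokenized)
    return {"n_docs": len(docs), "avg_len": avg_len, "df": df}


def idf(term, stats):
    """One common BM25 IDF variant (an assumption, not Cosmos DB's exact formula)."""
    n = stats["df"].get(term, 0)
    return math.log(1 + (stats["n_docs"] - n + 0.5) / (n + 0.5))


def bm25_score(doc, query_terms, stats, k1=1.2, b=0.75):
    """Score one document against the query using the supplied statistics."""
    tokens = doc["text"].lower().split()
    tf = Counter(tokens)
    score = 0.0
    for term in query_terms:
        f = tf[term]
        norm = f + k1 * (1 - b + b * len(tokens) / stats["avg_len"])
        score += idf(term, stats) * f * (k1 + 1) / norm
    return score


# Hypothetical multi-tenant container; "tenant" plays the role of the partition key.
docs = [
    {"tenant": "health", "text": "patient record backup"},
    {"tenant": "health", "text": "patient record audit"},
    {"tenant": "health", "text": "patient intake form"},
    {"tenant": "it", "text": "server backup schedule"},
    {"tenant": "it", "text": "backup restore test"},
    {"tenant": "it", "text": "backup retention policy"},
]

# Global: statistics over every document in the container.
global_stats = bm25_stats(docs)
# Scoped: statistics only over the partition key value used in the query.
local_stats = bm25_stats([d for d in docs if d["tenant"] == "health"])

global_score = bm25_score(docs[0], ["backup"], global_stats)
local_score = bm25_score(docs[0], ["backup"], local_stats)
```

In this toy data, "backup" is common across the container but rare within the "health" tenant, so the scoped statistics give it a higher IDF and the matching document a higher score than the global statistics do.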
How?
The user issues a query like:
And sets a new QueryRequestOption called FullTextScoreScope, which can be set to one of two values: local or global. The request option is inspected, and the query uses scoped/full stats accordingly.
Type of change
Please delete options that are not relevant.